1 Introduction (including choice of data, questions), Team description

2 Description of Data

We obtained the dataset from the Global Terrorism Database (GTD) http://start.umd.edu/gtd/, an open-source database with information on terrorist events around the world from 1970 through 2016 (with annual updates planned for the future). Unlike many other event databases, the GTD includes systematic data on domestic as well as international terrorist incidents that occurred during this period, and it now includes more than 170,000 cases. The dataset ends in 2016 because collection beyond 2016 is ongoing and the website is updated annually; the GTD maintainers expect to release the 2017 data in Summer 2018.

In our project, we divide the dataset into two subgroups: the data from 1970-1974 and the data from 2012-2016. We carry out data visualization and analysis on each subgroup separately and compare the results.

For the dataset from 1970-1974, there are 2670 observations of 135 variables. For the dataset from 2012-2016, there are 2670 observations of 135 variables.
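The two subsets are reported with identical sizes, which is worth double-checking because the yearly totals in Section 4.1 suggest the later period is much larger. A minimal sketch of how the sizes could be verified, assuming a hypothetical toy table `gtd_toy` in place of the real GTD data (which has 135 columns):

```r
# Toy stand-in for the GTD table; values are invented for illustration.
gtd_toy <- data.frame(
  iyear = c(1970, 1972, 1974, 1990, 2012, 2014, 2016, 2016),
  country_txt = c("United States", "United Kingdom", "Argentina", "Peru",
                  "Iraq", "Iraq", "Pakistan", "Somalia"),
  stringsAsFactors = FALSE
)

# Same filters as the report: up to 1974, and 2012 onwards.
early <- gtd_toy[gtd_toy$iyear <= 1974, ]
late  <- gtd_toy[gtd_toy$iyear >= 2012, ]

# dim() returns c(rows, columns) for each subset.
subset_sizes <- rbind(early = dim(early), late = dim(late))
colnames(subset_sizes) <- c("observations", "variables")
print(subset_sizes)
```

Running the same `dim()` calls on `terrorism1` and `terrorism2` would give the figures to report here.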

The variables include

library(openxlsx)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(ggthemes)
library(rworldmap)
## Loading required package: sp
## ### Welcome to rworldmap ###
## For a short introduction type :   vignette('rworldmap')
library(tidyverse)
## ─ Attaching packages ───────────────── tidyverse 1.2.1 ─
## ✔ tibble  1.4.2     ✔ purrr   0.2.4
## ✔ tidyr   0.8.0     ✔ stringr 1.3.0
## ✔ readr   1.1.1     ✔ forcats 0.2.0
## ─ Conflicts ─────────────────── tidyverse_conflicts() ─
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

3 Graphical Analysis of Data Quality

terrorism <- read.xlsx("../data/raw data/globalterrorismdb_0617dist.xlsx")
terrorism1<-terrorism[terrorism$iyear<=1974,]
#write.csv(terrorism1,"terrorism1.csv")
terrorism2<-terrorism[terrorism$iyear>=2012,]
#write.csv(terrorism2,"terrorism2.csv")

3.1 discover what’s in the data file that was exported

## choose variables

3.2 imputations for NULL values
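
Before imputing anything, it helps to know how much is missing in each variable. A minimal sketch, assuming a hypothetical toy frame; the values are invented, and `nkill` is assumed here to be the name of a numeric GTD column:

```r
# Toy data with a few missing values (invented for illustration).
toy <- data.frame(
  nkill = c(0, NA, 3, 1),
  city = c("Belfast", "Unknown", NA, "Baghdad"),
  iyear = c(1970, 1971, 2012, 2016),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colMeans() turns it into the
# fraction of missing entries per column.
na_share <- colMeans(is.na(toy))
print(sort(na_share, decreasing = TRUE))
```

Columns with a high missing share are candidates for being dropped rather than imputed.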

3.3 Outlier Analysis and Treatment

4 Main Analysis (focus on quality of EDA choices / techniques)

4.1 A heatmap of total number of terrorist attacks worldwide

To compare the total number of terrorist attacks worldwide, we draw the following two heatmaps.

# Count attacks per country, 1970-1974
a <- rle(sort(terrorism1$country_txt))
b <- data.frame(country_txt=a$values, number_of_terrorism_attacks_from_1970_to_1974 = a$lengths)

gtdMap <- joinCountryData2Map( b, 
                               nameJoinColumn="country_txt", 
                               joinCode="NAME" )
## 67 codes from your data successfully matched countries in the map
## 9 codes from your data failed to match with a country code in the map
## 176 codes from the map weren't represented in your data
mapCountryData( gtdMap, 
                nameColumnToPlot='number_of_terrorism_attacks_from_1970_to_1974',
                catMethod='fixedWidth', 
                numCats=100 )

# Count attacks per country, 2012-2016
a <- rle(sort(terrorism2$country_txt))
b <- data.frame(country_txt=a$values, number_of_terrorism_attacks_from_2012_to_2016 = a$lengths)

gtdMap <- joinCountryData2Map( b, 
                               nameJoinColumn="country_txt", 
                               joinCode="NAME" )
## 137 codes from your data successfully matched countries in the map
## 1 codes from your data failed to match with a country code in the map
## 106 codes from the map weren't represented in your data
#mapDevice('x11')
mapCountryData( gtdMap, 
                nameColumnToPlot='number_of_terrorism_attacks_from_2012_to_2016', 
                catMethod='fixedWidth', 
                numCats=100 )
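
The join messages above report country names that failed to match the map (9 in the first period, 1 in the second), so those countries are silently missing from the heatmaps. A minimal sketch of how unmatched names could be listed and recoded, using base R only; both name vectors and the recode table are hypothetical illustrations, not the real GTD or map name lists:

```r
# Hypothetical labels: gtd_names mimics country_txt values,
# map_names mimics the names known to the map.
gtd_names <- c("United States", "West Germany (FRG)", "Iraq", "South Vietnam")
map_names <- c("United States", "Germany", "Iraq", "Vietnam")

# Names present in the data but unknown to the map.
unmatched <- setdiff(gtd_names, map_names)
print(unmatched)

# Illustrative manual recode from historical labels to map names.
recode <- c("West Germany (FRG)" = "Germany", "South Vietnam" = "Vietnam")
fixed <- ifelse(gtd_names %in% names(recode), recode[gtd_names], gtd_names)
fixed <- unname(fixed)

# After recoding, nothing should be left unmatched.
still_unmatched <- setdiff(fixed, map_names)
print(still_unmatched)
```

Whether a historical state should be mapped onto a present-day country is a judgment call that should be stated alongside the map.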

In order to visualize the total number of terrorist attacks throughout the years, we also draw a supplementary line chart to facilitate comparison.

#attack trends 1970-1974
year_1970_1974 <- terrorism1 %>% group_by(iyear) %>% summarise(n=n())
p1<-ggplot(aes(x = iyear, y = n), data = year_1970_1974) + geom_line() + xlab("Year") + ylab("Number of Terrorist Attacks") + ggtitle("Global Terrorist Attacks From 1970 To 1974") 

#attack trends 2012-2016
year_2012_2016 <- terrorism2 %>% group_by(iyear) %>% summarise(n=n())
p2<-ggplot(aes(x = iyear, y = n), data = year_2012_2016) + geom_line() + xlab("Year") + ylab("Number of Terrorist Attacks") + ggtitle("Global Terrorist Attacks From 2012 To 2016") 

grid.arrange(p1, p2, nrow = 2)

Although the year-to-year trends differ between the two periods, the number of terrorist attacks is noticeably higher in recent years than in the 1970s. During 2012-2016, the yearly maximum was around 17,000 attacks and the yearly minimum around 8,500, whereas during 1970-1974 the yearly maximum was only about 650. Taken at face value, this suggests that the world has become considerably less stable and more dangerous than it was in the 1970s.
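
To put the size of the gap in concrete terms, the approximate figures read off the plot can be turned into ratios. A minimal sketch; the three inputs are the rounded values quoted in the text, not exact counts:

```r
# Approximate yearly counts as quoted in the text (read off the plot).
max_recent <- 17000  # yearly maximum, 2012-2016
min_recent <- 8500   # yearly minimum, 2012-2016
max_early  <- 650    # yearly maximum, 1970-1974

# How many times larger the recent figures are than the 1970s peak.
ratio_max <- max_recent / max_early
ratio_min <- min_recent / max_early
print(round(c(ratio_max, ratio_min), 1))
```

Even the quietest recent year is roughly 13 times the busiest year of 1970-1974, and the busiest recent year is roughly 26 times.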

4.2 Terrorist Attack Tactics Worldwide, 1970-1974; 2012-2016

We then explore what kinds of attacks occurred during these years.

#1970-1974
attacktype_1970_1974 <- terrorism1 %>% group_by(attacktype1_txt) %>% summarise(n=n())
p1<-ggplot(aes(x=reorder(attacktype1_txt, n), y=n), data=attacktype_1970_1974) + 
geom_bar(stat = 'identity') + xlab('Attack Type') + ylab('Number of Attacks') + ggtitle('Global Terrorist Attack Types From 1970 To 1974') + coord_flip() 

#2012-2016
attacktype_2012_2016 <- terrorism2 %>% group_by(attacktype1_txt) %>% summarise(n=n())
p2<-ggplot(aes(x=reorder(attacktype1_txt, n), y=n), data=attacktype_2012_2016) + 
geom_bar(stat = 'identity') + xlab('Attack Type') + ylab('Number of Attacks') + ggtitle('Global Terrorist Attack Types From 2012 To 2016') + coord_flip()

grid.arrange(p1, p2, nrow = 2)

From the bar charts above, we can see that Bombing/Explosion was the most frequently used method of terrorist attack in both 1970-1974 and 2012-2016. Armed Assault, Assassination, Hostage Taking (Kidnapping) and Facility/Infrastructure Attack were also among the most frequently used attack methods in both periods.
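
Because the two periods differ so much in total volume, raw counts are hard to compare across the two panels; shares within each period are easier to read. A minimal sketch in base R, where `share_by_type` is a hypothetical helper and the toy vectors are invented:

```r
# Percentage share of each category, sorted from largest to smallest.
share_by_type <- function(types) {
  counts <- table(types)
  shares <- 100 * as.numeric(counts) / sum(counts)
  names(shares) <- names(counts)
  sort(round(shares, 1), decreasing = TRUE)
}

# Invented toy data standing in for attacktype1_txt in each period.
early_types <- c(rep("Bombing/Explosion", 6), rep("Assassination", 3),
                 "Armed Assault")
late_types <- c(rep("Bombing/Explosion", 50), rep("Armed Assault", 30),
                rep("Assassination", 20))

early_share <- share_by_type(early_types)
late_share <- share_by_type(late_types)
print(early_share)
print(late_share)
```

Applied to `terrorism1$attacktype1_txt` and `terrorism2$attacktype1_txt`, the same helper would show whether a method's share changed, independently of the overall increase in attacks.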

4.3 Terrorist Attack Targets in 1970 and 2016

The dataset contains many different terrorist attack targets, such as Government, Military, Educational Institutions and so on. We compare the top 10 terrorist attack targets in the first year (1970) and the last year (2016) of our dataset. The variation in targets may be related to policy issues, motivations, religion, etc.

attack1970 <- terrorism[terrorism$iyear==1970, ]
by_target <- attack1970 %>% group_by(targtype1_txt) %>% 
  summarise(n=n())
by_target <- arrange(by_target, desc(n))%>% head(10)

p1<-ggplot(aes(x=reorder(targtype1_txt, n), y=n), data=by_target) +
  geom_bar(stat = 'identity') + ggtitle('Attack Targets, 1970') +
  coord_flip() + theme_fivethirtyeight()

attack2016 <- terrorism[terrorism$iyear==2016, ]
by_target <- attack2016 %>% group_by(targtype1_txt) %>% 
  summarise(n=n())
by_target <- arrange(by_target, desc(n))%>% head(10)

p2<-ggplot(aes(x=reorder(targtype1_txt, n), y=n), data=by_target) +
  geom_bar(stat = 'identity') + ggtitle('Attack Targets, 2016') +
  coord_flip() + theme_fivethirtyeight()

grid.arrange(p1, p2, nrow = 1)

The diversity of terrorist attack targets is higher in 2016 than in 1970. The biggest difference is that private citizens and property became the most frequent target in 2016, whereas this category does not even appear among the top 10 targets in 1970. One possible explanation is growing personal hatred over these decades, although the counts alone cannot confirm this.

The top 10 terrorist attack targets require the most homeland security resources. Policymakers can use these and other results to focus their counter-terrorism measures.
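
Which categories enter or leave the top list can be checked directly rather than read off the two charts. A minimal sketch with a hypothetical helper `top_n_names` and invented toy data (top 3 instead of top 10 to keep it short):

```r
# Names of the n most frequent categories in x.
top_n_names <- function(x, n) {
  counts <- sort(table(x), decreasing = TRUE)
  names(counts)[seq_len(min(n, length(counts)))]
}

# Invented toy data standing in for targtype1_txt in each year.
targets_1970 <- c(rep("Business", 5), rep("Military", 4), rep("Police", 3),
                  "Private Citizens & Property")
targets_2016 <- c(rep("Private Citizens & Property", 6), rep("Military", 4),
                  rep("Police", 2), "Business")

top_1970 <- top_n_names(targets_1970, 3)
top_2016 <- top_n_names(targets_2016, 3)

# Categories that entered or left the top list between the two years.
new_in_2016 <- setdiff(top_2016, top_1970)
dropped_since_1970 <- setdiff(top_1970, top_2016)
print(new_in_2016)
print(dropped_since_1970)
```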

4.4 Which regions were the most dangerous in 1970-1974 and 2012-2016

Having explored terrorist attack methods and targets in the previous analysis, we also want to know which regions have been targeted the most.

#1970-1974
region_1970_1974 <- terrorism1 %>% group_by(region_txt) %>% summarise(n=n())
p1<-ggplot(aes(x=reorder(region_txt, n), y=n), data=region_1970_1974) + geom_bar(stat = 'identity') + xlab('Region') + ylab('Number of Terrorist Attacks From 1970 To 1974') + ggtitle('Terrorist Attacks Region From 1970 to 1974') + coord_flip()

#2012-2016
region_2012_2016 <- terrorism2 %>% group_by(region_txt) %>% summarise(n=n())
p2<-ggplot(aes(x=reorder(region_txt, n), y=n), data=region_2012_2016) + geom_bar(stat = 'identity') + xlab('Region') + ylab('Number of Terrorist Attacks From 2012 To 2016') + ggtitle('Terrorist Attacks Region From 2012 to 2016') + coord_flip()

grid.arrange(p1, p2, nrow = 2)

The regions most affected by terrorist attacks have also changed a lot over the past decades. During 1970-1974, the region with the most terrorist attacks was Western Europe, followed by North America, South America, and Middle East & North Africa. During 2012-2016, Middle East & North Africa became the region with the largest number of terrorist attacks, followed by South Asia, Sub-Saharan Africa, and Southeast Asia. The change may be caused by the unstable political situation in the Middle East & North Africa region and by the dramatic shifts in the world order over the past four decades.
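
The change in ordering can be summarised as a rank shift per region. A minimal sketch; the counts below are invented placeholders, not GTD figures, and only the region names come from the text:

```r
# Invented counts per region for each period (placeholders only).
early_counts <- c("Western Europe" = 900, "North America" = 700,
                  "South America" = 300, "Middle East & North Africa" = 200)
late_counts <- c("Western Europe" = 1200, "North America" = 300,
                 "South America" = 800, "Middle East & North Africa" = 25000)

# Rank 1 = most attacks. Align the second vector by region name.
regions <- names(early_counts)
rank_early <- rank(-early_counts)
rank_late <- rank(-late_counts[regions])

# Positive shift = the region moved up the ranking in the later period.
rank_table <- data.frame(
  rank_early = rank_early,
  rank_late = rank_late,
  shift = rank_early - rank_late,
  row.names = regions
)
print(rank_table)
```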

4.5 Which cities were the most dangerous in 1970-1974 and 2012-2016

Next, we look more closely at how the most dangerous cities differ between 1970-1974 and 2012-2016. Cities are ranked by their total number of terrorist attacks.

attack_by_city <- terrorism1 %>% group_by(country_txt, city) %>% 
  summarise(n=n())
attack_by_city <- arrange(attack_by_city, desc(n))
top10_city<- head(attack_by_city, 10)

p1<-ggplot(data=top10_city, aes(x=reorder(city,-n), y=n, group=1)) +
  geom_line()+
  geom_point() + xlab("City") + ylab("Number of Terrorist Attacks") +
        ggtitle("Top 10 Most Dangerous Cities, 1970-1974") + theme_fivethirtyeight()

attack_by_city <- terrorism2 %>% group_by(country_txt, city) %>% 
  summarise(n=n())
attack_by_city <- arrange(attack_by_city, desc(n))
top10_city<- head(attack_by_city, 10)

p2<-ggplot(data=top10_city, aes(x=reorder(city,-n), y=n, group=1)) +
  geom_line()+
  geom_point() + xlab("City") + ylab("Number of Terrorist Attacks") +
        ggtitle("Top 10 Most Dangerous Cities, 2012-2016") + theme_fivethirtyeight()
grid.arrange(p1, p2, nrow = 2)

The most dangerous cities are very different in these two periods. During 1970-1974, the most dangerous cities were in the United Kingdom, the United States, Argentina, Uruguay and Turkey. Belfast was the most dangerous city in 1970-1974, with approximately 400 terrorist attacks in all.

During 2012-2016, the most dangerous cities were in Iraq, Pakistan, Somalia, Afghanistan, Libya and Yemen. Baghdad was the most dangerous city in 2012-2016, with approximately 4000 terrorist attacks in all. In addition, the total number of attacks increased sharply, which may be due to wars and other political issues.
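
The counts are grouped by country and city, but the plots label each point by `city` alone, so two cities with the same name would share an axis position, and placeholder entries could be ranked as if they were a city. A minimal sketch of one way to guard against both, assuming that records without a known city carry the label "Unknown" (an assumption about the data) and using invented toy rows:

```r
# Invented toy rows; "Unknown" is assumed to mark a missing city name.
toy <- data.frame(
  country_txt = c("United Kingdom", "United Kingdom", "Canada",
                  "Iraq", "Iraq", "Iraq"),
  city = c("London", "London", "London", "Baghdad", "Baghdad", "Unknown"),
  stringsAsFactors = FALSE
)

# Drop placeholder cities, then build a label that is unique per country.
known <- toy[toy$city != "Unknown", ]
known$label <- paste(known$city, known$country_txt, sep = ", ")

# Attack counts per unique city label, largest first.
city_counts <- sort(table(known$label), decreasing = TRUE)
print(city_counts)
```

Using such a label on the x-axis would keep same-named cities in different countries apart.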

4.6 Terrorist Attacks distribution by Region and Year 1970-1974 & 2012-2016

We are also curious about the worldwide distribution of attacks by region in each year of both time ranges, 1970-1974 and 2012-2016. This series of bar charts helps answer questions such as: 'Is the distribution of terrorist attacks similar from year to year within each time range?' and 'Are there major differences between the two time ranges in the distribution of terrorist attacks by region?' To answer these questions, we built the bar charts below.

#terrorism1970 <- read.csv('../data/terrorism1.csv', stringsAsFactors = F)
#terrorism2012 <- read.csv('../data/terrorism2.csv', stringsAsFactors = F)
# Named both_periods so the data does not share a name with combine() from dplyr/gridExtra
both_periods <- subset(terrorism, iyear<=1974 | iyear>=2012)
by_year <- both_periods %>% group_by(iyear,region_txt) %>% 
  summarise(n=n())
ggplot(by_year, aes(x = region_txt, y = n, fill = n)) + 
  geom_bar(stat = 'identity') +
  # x is region_txt (drawn vertically after coord_flip); years are the facets
  facet_wrap(~iyear) + xlab('Region') + coord_flip() +
  ggtitle('Terrorist Attacks by Region and Year 1970-1974 & 2012-2016') + 
  theme(legend.position="none")

by_year <- terrorism1 %>% group_by(iyear,region_txt) %>% 
  summarise(n=n())
ggplot(by_year, aes(x = region_txt, y = n, fill = n)) + 
  geom_bar(stat = 'identity') +
  facet_wrap(~iyear) + xlab('Region') + coord_flip() +
  ggtitle('Terrorist Attacks by Region and Year 1970-1974') + 
  theme(legend.position="none")

by_year <- terrorism2 %>% group_by(iyear,region_txt) %>% 
  summarise(n=n())
ggplot(by_year, aes(x = region_txt, y = n, fill = n)) + 
  geom_bar(stat = 'identity') +
  facet_wrap(~iyear) + xlab('Region') + coord_flip() +
  ggtitle('Terrorist Attacks by Region and Year 2012-2016') + 
  theme(legend.position="none")

From the graphs above, we can conclude that the world is clearly more dangerous than it was 40 years ago. In the 1970s, North America and Western Europe were the most dangerous regions in the world, while in the 2010s, Middle East & North Africa and South Asia became the main target regions of terrorists, which means the pattern has been completely overturned.

We also plotted pie charts to compare each region's contribution to the occurrence of terrorist attacks in the world.

# Attack counts per region (all regions), 1970-1974
region_dist_1970_1974 <- terrorism1 %>% group_by(region_txt) %>% 
  summarise(n=n())

bp<- ggplot(region_dist_1970_1974, aes(x="", y=n, fill=region_txt))+
geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Region, 1970-1974') 

# Attack counts per region (all regions), 2012-2016
region_dist_2012_2016 <- terrorism2 %>% group_by(region_txt) %>% 
  summarise(n=n())
bp<- ggplot(region_dist_2012_2016, aes(x="", y=n, fill=region_txt))+
geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Region, 2012-2016') 

The pie charts above show that, compared with 40 years ago, the target regions of terrorist attacks have become more diverse and the areas involved more extensive.
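
"More diverse" is hard to judge from pie slices alone. One way to put a number on it is a diversity index over the regional counts. The sketch below uses Shannon entropy, which is a technique swapped in here for illustration and not part of the original analysis; the count vectors are invented:

```r
# Shannon entropy of a vector of counts: higher means the attacks are
# spread more evenly across categories.
shannon_entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]  # skip empty categories, where p * log(p) is taken as 0
  -sum(p * log(p))
}

# Invented examples: one dominated by a single region, one nearly even.
concentrated <- c(90, 5, 5)
even <- c(34, 33, 33)

print(shannon_entropy(concentrated))
print(shannon_entropy(even))
```

Computing the index on the per-region counts of each period would show whether the later distribution is in fact the more even one.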

4.7 Comparison between Terrorist Attack numbers in Middle East & North Africa and North America by Year 1970-1974

Based on the analysis above, we want to pay more attention to two major regions that have seen much conflict over the past years: Middle East & North Africa and North America. First, we compare the number of attacks in the two regions in 1970-1974.

NA_MENA <- subset(terrorism1, region_txt == "North America" | region_txt == "Middle East & North Africa")
n_NAMENA <- NA_MENA %>% group_by(region_txt, iyear) %>% 
  summarise(n=n())
ggplot(n_NAMENA, aes(x = iyear, y = n, fill = region_txt)) +
  geom_col() +
  xlab('Year') +
  facet_wrap(~ region_txt) +
  theme(legend.position = "none") +
ggtitle("Comparison between Terrorist Attack numbers in Middle East & North Africa and North America by Year 1970-1974")

And this is the comparison during the period of 2012-2016.

NA_MENA <- subset(terrorism2, region_txt == "North America" | region_txt == "Middle East & North Africa")
n_NAMENA <- NA_MENA %>% group_by(region_txt, iyear) %>% 
  summarise(n=n())
ggplot(n_NAMENA, aes(x = iyear, y = n, fill = region_txt)) +
  geom_col() +
  xlab('Year') +
  facet_wrap(~ region_txt) +
  theme(legend.position = "none") +
ggtitle("Comparison between Terrorist Attack numbers in Middle East & North Africa and North America by Year 2012-2016")

This contrast is consistent with the conclusion of the previous section.

4.8 Types of Terrorist Attacks in North America and Middle East & North Africa by Year 1970-1974

Now that we know the distribution of attack numbers among years, we want to know more details about these terrorist activities. So the barchart below will show us the contribution of different attack types in a certain year and a certain region.

NorthAmerica <- terrorism1[terrorism1$region_txt=='North America', ]
NorthAmerica_type <- NorthAmerica %>% group_by(attacktype1_txt, iyear) %>% 
  summarise(n=n())
ggplot(aes(x = iyear, y = n, fill = attacktype1_txt), data = NorthAmerica_type) + geom_bar(stat = 'identity') +
ggtitle("Terrorist Attacks in North America by Year 1970-1974")

MENA <- terrorism1[terrorism1$region_txt=='Middle East & North Africa', ]
MENA_type <- MENA %>% group_by(attacktype1_txt, iyear) %>% 
  summarise(n=n())
ggplot(aes(x = iyear, y = n, fill = attacktype1_txt), data = MENA_type) + geom_bar(stat = 'identity') +
ggtitle("Terrorist Attacks in Middle East & North Africa by Year 1970-1974")

For North America in the 1970s, bombings/explosions accounted for a large proportion of all attack types, and there were also many facility/infrastructure attacks. For Middle East & North Africa in the 1970s, bombing/explosion was also a major attack type, but the other main methods were armed assault, kidnapping and hijacking.

4.9 Types of Terrorist Attacks in North America and Middle East & North Africa by Year 2012-2016

NorthAmerica <- terrorism2[terrorism2$region_txt=='North America', ]
NorthAmerica_type <- NorthAmerica %>% group_by(attacktype1_txt, iyear) %>% 
  summarise(n=n())
ggplot(aes(x = iyear, y = n, fill = attacktype1_txt), data = NorthAmerica_type) + geom_bar(stat = 'identity') +
ggtitle("Terrorist Attacks in North America by Year 2012-2016")

MENA <- terrorism2[terrorism2$region_txt=='Middle East & North Africa', ]
MENA_type <- MENA %>% group_by(attacktype1_txt, iyear) %>% 
  summarise(n=n())
ggplot(aes(x = iyear, y = n, fill = attacktype1_txt), data = MENA_type) + geom_bar(stat = 'identity') +
ggtitle("Terrorist Attacks in Middle East & North Africa by Year 2012-2016")

Compared with 40 years ago, the share of bombings/explosions in North America decreased, and facility/infrastructure attack became the most common attack type. For Middle East & North Africa, however, the distribution has not changed much, and the number of bombings/explosions is overwhelmingly large.
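
Statements about the percentage of an attack type are easier to check from within-year shares than from stacked counts. A minimal sketch in base R with invented toy rows; the column names follow the ones used above:

```r
# Invented toy rows standing in for one region's records.
toy <- data.frame(
  iyear = c(2012, 2012, 2012, 2012, 2013, 2013),
  attacktype1_txt = c("Bombing/Explosion", "Bombing/Explosion",
                      "Bombing/Explosion", "Armed Assault",
                      "Bombing/Explosion", "Armed Assault"),
  stringsAsFactors = FALSE
)

# Year-by-type counts, then row-wise percentages (each year sums to 100).
counts <- table(toy$iyear, toy$attacktype1_txt)
shares <- round(100 * prop.table(counts, margin = 1), 1)
print(shares)
```

The same table computed per region and period would back the comparison of attack-type shares with actual percentages.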

4.10 Comparison of Terrorist Attacks Distribution by Country in North America and Middle East & North Africa, 1970-1974 & 2012-2016

Next, we combine the two regions to explore the contribution of each country within them.

NA_MENA <- subset(terrorism1, region_txt == "North America" | region_txt == "Middle East & North Africa")
n_NAMENA <- NA_MENA %>% group_by(country_txt) %>% 
  summarise(n=n())
bp<- ggplot(n_NAMENA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in North America& Middle East & North Africa and Year 1970-1974') 

NA_MENA <- subset(terrorism2, region_txt == "North America" | region_txt == "Middle East & North Africa")
n_NAMENA <- NA_MENA %>% group_by(country_txt) %>% 
  summarise(n=n())
bp<- ggplot(n_NAMENA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in North America& Middle East & North Africa and Year 2012-2016') 

Iraq clearly replaced the United States as the most dangerous country across North America and Middle East & North Africa. This is probably due to the influence of the Iraq War, and it suggests that much of the risk has shifted from the United States to the Middle East.

4.11 Comparison of Terrorist Attacks Distribution by Country in North America and Year 1970-1974 & 2012-2016

Building on this, we look at the two regions separately and analyse the relative safety of the countries within each region.

NAmerica <- subset(terrorism1, region_txt == "North America")
n_NA <- NAmerica %>% group_by(country_txt) %>% 
  summarise(n=n())
bp<- ggplot(n_NA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in North America and Year 1970-1974') 

NAmerica <- subset(terrorism2, region_txt == "North America")
n_NA <- NAmerica %>% group_by(country_txt) %>% 
  summarise(n=n())
bp<- ggplot(n_NA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in North America and Year 2012-2016') 

According to the pie charts, although the United States is still the main attack target in North America, the risks in Mexico and Canada have increased markedly compared with 40 years ago. All in all, Canada is still the safest country in North America.

4.12 Comparison of Terrorist Attacks Distribution by Country in Middle East & North Africa and Year 1970-1974 & 2012-2016

NMANA <- subset(terrorism1, region_txt == "Middle East & North Africa")
n_MANA <- NMANA %>% group_by(country_txt) %>% 
  summarise(n=n())
bp<- ggplot(n_MANA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in Middle East & North Africa and Year 1970-1974') 

NMANA <- subset(terrorism2, region_txt == "Middle East & North Africa")
n_MANA <- NMANA %>% group_by(country_txt) %>% 
  summarise(n=n())
bp<- ggplot(n_MANA, aes(x="", y=n, fill=country_txt))+
geom_bar(width = 1, stat = "identity")
pie <- bp + coord_polar("y", start=0)
pie + theme(axis.text.x=element_blank(), axis.title.x = element_blank(),
  axis.title.y = element_blank()) + ggtitle('Terrorist Attacks Distribution by Country in Middle East & North Africa and Year 2012-2016') 

For countries in Middle East & North Africa, risks and conflicts have become more concentrated: after several wars in the region during the past decades, the security situation in Middle East & North Africa has changed greatly.
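
How concentrated the attacks are within a region can be expressed as the share held by the top few countries. A minimal sketch with a hypothetical helper `top_share`; the per-country counts are invented placeholders, not GTD figures:

```r
# Fraction of all attacks that fall in the k countries with the most attacks.
top_share <- function(counts, k = 1) {
  sorted <- sort(counts, decreasing = TRUE)
  sum(sorted[seq_len(min(k, length(sorted)))]) / sum(counts)
}

# Invented per-country counts for the two periods (placeholders only).
early <- c(Israel = 40, Lebanon = 30, Turkey = 20, Iran = 10)
late <- c(Iraq = 700, Yemen = 100, Libya = 100, Syria = 100)

print(top_share(early, 1))
print(top_share(late, 1))
```

A larger top-country share in the later period would support the reading that risk has become more concentrated.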

5 Executive Summary (focus on quality of presentation choices / techniques)

6 Interactive Component

7 Conclusion